ABSTRACT

The Two-Micron All-Sky Survey (2MASS) has mapped out the low-redshift Universe down to K_s ~ 14 mag. As its near-infrared photometry primarily probes the featureless Rayleigh-Jeans tail of galaxy spectral energy distributions, colour-based redshift estimation is rather uninformative. Until now, redshift estimates for this dataset have relied on optical follow-up, which suffers from selection biases. Here we use the newly developed technique of clustering-based redshift estimation to infer the redshift distribution of the 2MASS sources regardless of their optical properties. We characterise the redshift distributions of objects from the Extended Source Catalogue as a function of near-infrared colours and brightness and report some observed trends. We also apply the clustering redshift technique to dropout populations (sources with non-detections in one or more near-infrared bands) and present their redshift distributions. Combining all extended sources, we confirm with clustering redshifts that the distribution of this sample extends up to z ~ 0.35. We perform a similar analysis with the Point Source Catalogue and show that it can be separated into stellar and extragalactic contributions, with galaxies reaching z ~ 0.7. We estimate that the Point Source Catalogue contains 1.6 million extragalactic objects: as many as in the Extended Source Catalogue, but probing a cosmic volume ten times larger.

Subject headings: galaxies: distances and redshifts - methods: data analysis 


1. INTRODUCTION 

The near-infrared (NIR) sky provides a view of the Universe less affected by dust obscuration than visible wavelengths. Observations in this regime significantly reduce extinction effects due to the Milky Way as well as the self-obscuration of extragalactic sources, particularly when viewed edge-on. The Two-Micron All-Sky Survey (2MASS; Skrutskie et al. 2006) has produced the largest compilation of sources detected in the near-infrared (1-2.4 μm), containing over 470 million sources in the Point Source Catalogue (PSC; primarily stars) and 1.6 million sources in the Extended Source Catalogue (XSC; primarily galaxies; Jarrett et al. 2000). The full-sky coverage makes this survey particularly well suited for galaxy studies and cosmological tests limited by cosmic variance. However, such extragalactic experiments typically require knowledge of galaxy redshifts; unfortunately, photometric redshift estimation based only on near-infrared data is difficult. At low redshift, most of the flux received from galaxies originates from the Rayleigh-Jeans tail of stellar photospheres, a featureless spectral energy distribution. This limits the information available to discriminate redshift, age or metallicity.

Photometric redshift estimates based solely on near-infrared data have been attempted (Jarrett 2004; Kochanek et al. 2003). Since no colour variation is expected from Rayleigh-Jeans tails, redshift estimation is typically performed by assuming a fixed K-band galaxy luminosity and directly inferring distance from the observed flux. Such crude estimates typically lead to redshift errors of order 50%. Recently, several authors have cross-matched 2MASS galaxies with optical data (… 2010, using the SuperCOSMOS survey) to estimate redshifts. Bilicki et al. (2014) also use SuperCOSMOS, as well as WISE observations in the infrared, to better constrain the redshift. Generally, they show that for typical 2MASS sources, when optical photometry is available, the near-infrared data do not substantially improve the accuracy of redshift estimation. While optical data bring useful information for redshift estimation, this approach suffers from one limitation: requiring optical data introduces a selection bias, as certain near-infrared sources will not be detected in the optical; such sources are referred to as optical "dropouts".
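To make the fixed-luminosity approach concrete, the sketch below assumes that every galaxy has the same absolute magnitude M_K, converts the observed K_s magnitude into a distance through the distance modulus, and maps distance to redshift with a linear Hubble law (no K-correction). A scatter σ_M in the true absolute magnitudes then propagates into a fractional distance, and hence redshift, error of about (ln 10 / 5) σ_M ≈ 0.46 σ_M, so a luminosity spread of roughly one magnitude already yields errors of order 50%. The values H0 = 70 km s$^{-1}$ Mpc$^{-1}$ and M_K = -24 are illustrative assumptions, not taken from the cited works.

```python
import math

C_KM_S = 299792.458  # speed of light [km/s]
H0 = 70.0            # assumed Hubble constant [km/s/Mpc] (illustrative)


def fixed_luminosity_redshift(m_k, abs_mag_k=-24.0):
    """Redshift implied by assuming every galaxy has absolute magnitude abs_mag_k.

    Uses the distance modulus m - M = 5 log10(d / 10 pc) and a linear Hubble
    law z ~ H0 d / c, valid only at low redshift (no K-correction applied).
    """
    d_pc = 10.0 ** ((m_k - abs_mag_k + 5.0) / 5.0)
    d_mpc = d_pc / 1.0e6
    return H0 * d_mpc / C_KM_S


def fractional_distance_error(sigma_mag):
    """Linearised fractional distance error caused by a magnitude scatter sigma_mag."""
    return math.log(10.0) / 5.0 * sigma_mag


z_est = fixed_luminosity_redshift(13.0)
print(f"K_s = 13.0 -> z ~ {z_est:.3f}")
print(f"sigma_M = 1 mag -> ~{fractional_distance_error(1.0):.0%} distance error")
```

Because z is proportional to d in this approximation, the fractional redshift error equals the fractional distance error.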

Direct redshift estimation based on near-infrared spectroscopic observations also faces important limitations, arising from the discontinuous transparency of the Earth's atmosphere together with the small number of strong spectral features located in the near-infrared (see Figure 1). As with the photometric approach, in practice the characterization of near-infrared source redshifts has relied on additional spectroscopic data at optical wavelengths. The 2MASS Redshift Survey (2MRS; Huchra et al. 2012) has produced optically based spectroscopic redshifts for about 45 000 flux-limited sources. This corresponds to only 3% of its parent extended source catalogue and is restricted to the nearby Universe (z < 0.1). Similarly, the 6dFGS redshift survey provides redshifts complete to a deeper flux limit over the southern sky (Jones et al. 2009). Another, independent, source of spectroscopic redshift information comes from cross-matching 2MASS sources to the Sloan Digital Sky Survey. At K_s < 14, about 90% of the 2MASS extended sources overlapping the SDSS footprint have optical spectra and redshift measurements (Strauss et al. 2002). The remaining 10% correspond either to sources missed by the SDSS spectroscopic targeting or to optical "dropouts": sources not detected by the SDSS photometry, owing to either extreme reddening or unusual spectral energy distributions. Consequently, the SDSS spectroscopic redshift estimates of near-infrared sources also suffer from optical selection biases.

Fig. 1.— Example spectral energy distributions of galaxies in the NIR. Model galaxy SEDs are produced by the GALEV simulation suite (Kotulla et al. 2009). Models with different specific star formation rates (sSFR) are shown with arbitrary offsets and redshifted as indicated, with prominent emission lines labelled. The green curve is representative of elliptical galaxies with older stellar populations, while the blue curve is representative of a starburst. For reference, the 2MASS filter response curves are shown in the background. Only a limited number of emission lines are available for any spectroscopic redshift determination. Further, galaxy SEDs are smooth, limiting the amount of information available for photometric redshifts.

In order to circumvent this limitation affecting both photometric and spectroscopic samples, one needs to estimate the redshift distribution of near-infrared sources independently of optical detectability. This can be done using clustering-based redshift estimation. Rather than using spectral energy distributions, this technique infers redshifts from angular clustering measurements. This approach does not require any additional photometric information on the selected sources. It can therefore be applied to the entire 2MASS survey irrespective of optical counterparts. The use of spatial clustering to extract redshift information goes back to Seldner & Peebles (1979). While the idea had been known for decades, theoretical and practical approaches to clustering-based redshift estimation followed much later (Landy et al. 1996; Ho et al. 2008; Newman 2008; Menard et al. 2013; Schmidt et al. 2013; McQuinn & White 2013). The feasibility and accuracy of this technique have been investigated by our team (Rahman et al. 2015a). Here we use this technique to explore and characterise the extragalactic sources of 2MASS, in both the extended and point source catalogues, down to the limiting magnitude of the survey.


2. DATA ANALYSIS 
2.1. Clustering-based Redshift Estimation 


Our approach is based on the method introduced in Menard et al. (2013), tested against simulations in Schmidt et al. (2013), and fully implemented and applied to data in Rahman et al. (2015a). We refer the reader to these papers for a detailed description of the formalism and technical considerations. In this section, we briefly re-introduce the main concepts.

We consider two populations of extragalactic objects: (i) a reference population, for which the angular positions and redshifts of each object are known. This population is characterised by a redshift distribution $dN_r/dz$, a mean surface density $n_r$, a total number of sources $N_r$, and a clustering amplitude or bias $b_r$; and (ii) an unknown population, for which angular positions are known but redshifts are not. Similarly, this population is characterised by the quantities $dN_u/dz$, $n_u$, $N_u$ and $b_u$. The basic principle is that if the two populations do not overlap in redshift, their angular cross-correlation is expected to be zero (ignoring gravitational lensing effects). As discussed by Menard et al. (2013), in the ideal case of an unknown sample located within a narrow redshift range, one can probe its redshift distribution by splitting the reference population into contiguous redshift slices $\delta z_i$ and measuring the angular or spatial cross-correlation $w_{ur}(\theta, z_i)$ with the unknown population for each subsample $i$. Once a cross-correlation signal is found, the amplitude of the redshift distribution is simply obtained through the normalization:


$$\int dz\, \frac{dN_u}{dz} = N_u . \qquad (1)$$


This normalization alleviates the need to characterise the amplitude of the clustering bias $b_u$.
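Equation (1) only fixes the overall amplitude; the shape comes from the per-slice cross-correlations. As a minimal sketch, the function below assumes the scaling $w_{ur}(z_i) \propto (dN_u/dz)\, b_u(z_i)\, b_r(z_i)\, \bar{w}_{\rm DM}(z_i)$, where $\bar{w}_{\rm DM}$ denotes the dark-matter clustering term in slice $i$ and $b_u \propto z^{\gamma}$ with $\gamma = d\log b_u/d\log z$. The inputs, the toy numbers and the function name are hypothetical, and this is not the implementation of Rahman et al. (2015a).

```python
import numpy as np


def clustering_dndz(z_mid, dz, w_ur, b_r, w_dm, n_unknown, gamma=1.0):
    """Hypothetical sketch: per-slice cross-correlation amplitudes -> dN_u/dz.

    Assumes w_ur(z_i) is proportional to dN_u/dz * b_u * b_r * w_dm in each
    reference slice i, with b_u(z) proportional to z**gamma (only its shape
    is needed). The amplitude is then fixed by Eq. (1): sum(dN_u/dz * dz) = N_u.
    """
    z_mid = np.asarray(z_mid, dtype=float)
    b_u_shape = z_mid ** gamma  # bias evolution, up to an unknown constant
    shape = np.asarray(w_ur, float) / (
        np.asarray(b_r, float) * np.asarray(w_dm, float) * b_u_shape
    )
    return n_unknown * shape / np.sum(shape * dz)  # enforce Eq. (1)


# Toy usage: 20 reference slices of width 0.02 centred on 0.04 ... 0.42.
z_mid = 0.04 + 0.02 * np.arange(20)
dz = 0.02
true_shape = np.exp(-0.5 * ((z_mid - 0.15) / 0.05) ** 2)
b_r = 1.5 * (1.0 + z_mid)   # made-up reference bias
w_dm = np.ones_like(z_mid)  # made-up dark-matter clustering term
w_ur = 1e-3 * true_shape * z_mid * b_r * w_dm  # simulated with b_u proportional to z
dndz = clustering_dndz(z_mid, dz, w_ur, b_r, w_dm, n_unknown=1.6e6)
print(np.sum(dndz * dz))  # equals N_u by construction
```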

The redshifts derived through this method use information solely from a source's angular position rather than its spectral energy distribution (SED). Consequently, the method can be applied to objects detected in only one bandpass. Classical photometric redshift estimation using SED fitting suffers from well-documented limitations, such as the need for a complete set of spectral templates, catastrophic failures, and dust reddening effects. Machine learning-based photometric redshift estimation has also been used extensively, but has similar limitations of catastrophic failures and dust reddening, and can be significantly affected by incomplete or biased training sets. In contrast, the fundamental limitation of clustering redshifts is the degeneracy between the redshift distribution of the unknown population, $dN_u/dz$, and the redshift evolution of its bias, $b_u(z)$. This degeneracy introduces errors that grow as the redshift distribution broadens but, as motivated theoretically in Menard et al. (2013) and demonstrated with real data by Rahman et al. (2015a), such errors can be made sufficiently small for a large range of astrophysical applications. We note that only the evolution of the bias with redshift affects the clustering redshift, not its specific value. In addition, this effect is minimised by subdividing the unknown sources into subsamples that, by construction, have redshift distributions narrower than that of the total sample. This technique has been verified by testing against spectroscopic galaxies (Rahman et al. 2015a), and has been used to measure the redshift distribution of the entire Sloan Digital Sky Survey photometric catalogue (Rahman et al. 2015b). The SDSS work demonstrates that clustering and photometric redshifts have unrelated systematics and that, in cases where photometric redshifts are unsuitable, clustering redshifts can provide an alternative path to inferring redshift information.
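The statement that only the evolution of the bias matters, not its value, can be checked numerically. In the toy example below (a made-up Gaussian $dN_u/dz$, with the reference-sample and dark-matter factors assumed to be already divided out), a constant bias of arbitrary amplitude cancels exactly in the normalisation of Eq. (1), whereas a bias evolving as $(1+z)^2$ that is wrongly assumed constant shifts the recovered distribution towards higher redshift. The $(1+z)^2$ form is purely illustrative.

```python
import numpy as np

z = np.linspace(0.03, 0.40, 38)  # slice centres, spacing 0.01
dz = z[1] - z[0]
true_dndz = np.exp(-0.5 * ((z - 0.12) / 0.05) ** 2)
true_dndz /= np.sum(true_dndz) * dz  # unit-normalised "true" distribution


def recover(w_ur, assumed_bias):
    """Divide out an assumed b_u(z) and renormalise to unit integral (Eq. 1)."""
    shape = w_ur / assumed_bias
    return shape / (np.sum(shape) * dz)


# Constant bias with an arbitrary value: cancels in the normalisation.
w_const = 2.7 * true_dndz
err_const = np.max(np.abs(recover(w_const, np.ones_like(z)) - true_dndz))

# Bias evolving as (1+z)^2 (illustrative), first ignored, then modelled correctly.
w_evol = true_dndz * (1.0 + z) ** 2
wrong = recover(w_evol, np.ones_like(z))
right = recover(w_evol, (1.0 + z) ** 2)

mean_true = np.sum(z * true_dndz) * dz
mean_wrong = np.sum(z * wrong) * dz
print(f"constant bias, max error: {err_const:.1e}")
print(f"mean z: true {mean_true:.4f}, evolving bias ignored {mean_wrong:.4f}")
```

This is also why the analysis subdivides the sample: the narrower each subsample's redshift range, the less any residual bias evolution can distort its recovered distribution.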

We estimate clustering redshift distributions with the procedure implemented in Rahman et al. (2015a), to which we refer the reader for the details of the implementation. Here we simply list the key parameters used in the present analysis. Our reference spectroscopic sample is constructed by combining the SDSS Legacy spectroscopic sample, extending up to z ~ 0.45 (Strauss et al. 2002; Eisenstein et al. 2001), and the CMASS luminous red galaxies from the Baryon Oscillation Spectroscopic Survey, extending to z ~ 0.8 (Padmanabhan et al. 2012). We combine these samples to maximise the number of sources available for cross-correlation, and consequently to minimise measurement noise. For this reference sample, we use the integrated bias evolution presented in Rahman et al. (2015a), and we use $d\log b_u/d\log z = 1$ for the unknown population. The error induced by this choice of bias is discussed at length in Menard et al. (2013). We measure the angular overdensity in an aperture of 0.3 < r < 3 Mpc, weighted by $\theta^{-0.8}$. To avoid measuring correlation functions over very large angular apertures on the sky, and to limit cosmic variance, we restrict our analysis to redshifts greater than z = 0.03. This limit is chosen because of the systematics (e.g., excluded areas, varying background densities) introduced in the measurement of source densities over large areas on the sky.
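The aperture measurement can be sketched as follows: at each reference redshift the 0.3-3 Mpc aperture is converted into angles with the angular-diameter distance, pairs inside the annulus are weighted by $\theta^{-0.8}$, and the weighted pair count is compared with the analytic flat-sky expectation for an unclustered sample of the same mean density. The flat ΛCDM parameters (H0 = 70, Ωm = 0.3), the interpretation of r as a proper separation, and all function names are assumptions for illustration; the estimator of Rahman et al. (2015a) may differ in these details.

```python
import numpy as np

C_KM_S = 299792.458  # speed of light [km/s]


def comoving_distance_mpc(z, h0=70.0, omega_m=0.3, n_steps=2001):
    """Comoving distance in flat LCDM (assumed parameters), trapezoidal rule."""
    zs = np.linspace(0.0, z, n_steps)
    inv_e = 1.0 / np.sqrt(omega_m * (1.0 + zs) ** 3 + (1.0 - omega_m))
    return C_KM_S / h0 * np.sum(0.5 * (inv_e[1:] + inv_e[:-1]) * np.diff(zs))


def aperture_radians(z, r_min_mpc=0.3, r_max_mpc=3.0):
    """Angular radii (radians) subtending r_min..r_max (taken as proper Mpc) at z."""
    d_a = comoving_distance_mpc(z) / (1.0 + z)  # angular-diameter distance
    return r_min_mpc / d_a, r_max_mpc / d_a


def weighted_overdensity(sep_rad, n_ref, n_bar, z, slope=-0.8):
    """theta**slope-weighted overdensity of unknown sources around reference objects.

    sep_rad : separations (radians) of all reference-unknown pairs in one slice
    n_ref   : number of reference objects in the slice
    n_bar   : mean surface density of the unknown sample (per steradian)
    The random expectation is computed analytically in the flat-sky limit.
    """
    t_min, t_max = aperture_radians(z)
    sep = np.asarray(sep_rad, dtype=float)
    in_ap = (sep >= t_min) & (sep < t_max)
    observed = np.sum(sep[in_ap] ** slope)
    p = slope + 2.0  # integral of theta**slope * 2 pi theta d(theta)
    expected = n_ref * n_bar * 2.0 * np.pi * (t_max ** p - t_min ** p) / p
    return observed / expected - 1.0


# Usage: unclustered sources around a single reference object at z = 0.1.
rng = np.random.default_rng(0)
radius = 0.01  # disc radius [rad], larger than the outer aperture at z = 0.1
n_pts = 200_000
sep = radius * np.sqrt(rng.uniform(size=n_pts))  # uniform surface density in a disc
delta = weighted_overdensity(sep, n_ref=1, n_bar=n_pts / (np.pi * radius**2), z=0.1)
print(f"overdensity of a random sample: {delta:.4f}")
```

The $\theta^{-0.8}$ weight up-weights small separations, where the correlation signal is strongest relative to the noise.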


2.2. The 2MASS Survey

Throughout this work, we use the 2MASS Survey and the 2MASS Redshift Survey, which we describe here. Near-infrared sources detected in the 2MASS survey are divided into two samples: the Extended Source Catalogue (XSC), containing mostly galaxies, and the Point Source Catalogue (PSC), containing mostly stars. Based on the extended sources, an additional useful value-added catalogue is available: the 2MASS Redshift Survey, which contains spectroscopic redshifts for a flux-limited sample.

The extended source catalogue contains all sources with an angular size larger than the 2MASS PSF (3″ in the K_s band) with a minimum signal-to-noise ratio of 7 in any of the three 2MASS bands (Skrutskie et al. 2006). In practice, however, this results in a required detection in the K_s band for the majority of the objects. The catalogue consists of extragalactic sources, which dominate at high Galactic latitude, together with a fraction of Galactic sources closer to the midplane. Extended sources consistent with double or multiple stellar systems are excluded using colour information.⁴ At its photometric limits, the extended source catalogue contains objects without measured magnitudes in the J and/or H bands, which we refer to as "dropouts". The completeness of the extended source catalogue is estimated to be > 95% at K_s < 14.0, as determined from a comparison with the Virgo Cluster.⁵ The 2MASS (Vega-based) magnitudes we use for extended sources in this paper are the fiducial photometry, with radii set by the 20 mag arcsec$^{-2}$ isophote in the K_s band. We present the magnitude distribution of the catalogue in Figure 2.

Fig. 2.— The magnitude distribution of the 2MASS Point Source Catalogue in the Northern Galactic Cap (black) and Extended Source Catalogue (blue). For the extended sources, the contributions of sources without J-band (green) or H-band detections (yellow), and without both H- and J-band magnitudes (magenta), are also indicated. The similarity in density of point and extended sources at K_s = 14.0 is purely coincidental.

The remainder of 2MASS detected sources fall into the Point Source Catalogue. The vast majority (90%) of the catalogue is located within 30° of the Galactic midplane and is dominated by stars. The sources at higher Galactic latitude are expected to consist of both stars and galaxies with angular diameters smaller than the 2MASS point spread function. We present the magnitude distribution of a Northern Galactic Cap sample of the PSC in Figure 2. The PSC is complete down to K_s < 14.2.⁶

A subsample of 2MASS galaxies has been selected for spectroscopic redshift follow-up through the 2MASS Redshift Survey (2MRS; Huchra et al. 2012). This sample is 98% complete to K_s = 11.75 over 91% of the sky. Optical spectroscopy of these sources was obtained at a combination of observatories in both the Northern and Southern hemispheres over 14 years to ensure near all-sky coverage. Augmenting their observed redshifts with those from complementary surveys, the 2MRS catalogue consists of 44 599 spectroscopically measured redshifts. While this work marks a substantive increase in the redshift information available for infrared-selected galaxies, it amounts to only 3% of the entire 2MASS extended source catalogue and does not provide any information for the small number of sources not detected in the optical at this flux limit.

4 Through the use of the "g-score" as described in the 2MASS Explanatory Supplement: http://www.ipac.caltech.edu/2mass/releases/allsky/doc/sec2_3b.html

5 Explanatory Supplement to the 2MASS All Sky Data Release and Extended Mission Products: http://www.ipac.caltech.edu/2mass/releases/allsky/doc/sec6_5bl.html

6 Explanatory Supplement to the 2MASS All Sky Data Release and Extended Mission Products: http://www.ipac.caltech.edu/2mass/releases/allsky/doc/sec6_5al.html
To ensure homogeneous coverage of the reference sample and to minimise the effects of both Galactic extinction and Galactic sources in the 2MASS extended source catalogue, our analysis focuses on a 4800 square degree area within the Northern Galactic Cap, defined by

$$131^\circ < \alpha < 241^\circ, \qquad 5^\circ < \delta < 60^\circ, \qquad (2)$$

with fields surrounding bright stars removed. We use no other criteria or flags for the sample, to avoid placing additional (possibly complicated) selection functions on it. Additionally, we do not correct for source extinction using external information. To account for fluctuations of the source density due to cosmic variance and other potential systematic effects, we estimate the mean source density more locally by measuring it independently in 16 equal-area regions spanning the entire footprint.

2.3. Testing clustering redshifts with 2MASS spectroscopic sources

We first test the robustness of our clustering redshift technique in the context of 2MASS data. To do so, we estimate clustering redshifts for sources in the 2MASS Redshift Survey (with flux limit K_s < 11.75), for which complete spectroscopic redshift measurements are available, over the sky footprint defined in Eq. (2). This particular set of galaxies serves as a validation of the technique on a flux-limited subsample of the data. The redshift range spanned by these galaxies is sufficiently narrow (Δz ~ 0.1) that the redshift evolution of the galaxy bias can be neglected. We therefore apply the clustering redshift technique to the sample as a whole, without subsampling in photometric space. Figure 3 shows the distribution of spectroscopic redshifts for this sample with the purple histogram and the distribution of clustering redshifts with the black data points, binned with Δz = 0.002. The two distributions show good agreement, verifying the robustness of our clustering redshift estimation method. This test also shows that clustering redshift estimation is feasible with sparse samples: the source density of the 2MASS spectroscopic sample is about one object per square degree.

Fig. 3.— A comparison between the spectroscopic redshifts (pink) and clustering redshifts (blue) of galaxies selected in the 2MASS Redshift Survey (Huchra et al. 2012). For the clustering redshifts, we use $d\log b/d\log z = 1$, as in Rahman et al. (2015a).

3. CLUSTERING REDSHIFTS OF 2MASS EXTENDED SOURCES

We now estimate clustering redshift distributions for the entire 2MASS XSC with K_s < 14. Since this sample is expected to span a redshift range substantially larger than that of the 2MASS Redshift Survey, the degeneracy between its galaxy bias and redshift distribution may have an appreciable effect on the accuracy of the redshift distribution estimate (as discussed in § 2.1). To minimise it, we first subsample the dataset in colour space to reduce the range of galaxy types within each individual subsample, thus reducing the degeneracy between the galaxy bias and redshift distribution. We characterise the redshift distribution of objects selected in each colour sample, including near-infrared dropouts. The redshift distributions of the individual colour-based subsamples are then combined to present the global clustering redshift distribution of the 2MASS extended sources.
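The footprint of Eq. (2) and its 16 equal-area density regions can be sketched as follows. For an RA/Dec box the exact solid angle is $\Delta\alpha\,(\sin\delta_{\max}-\sin\delta_{\min})$, about 4900 deg$^2$ before masking, which is roughly consistent with the ~4800 deg$^2$ quoted once bright-star regions are removed. Splitting RA uniformly and $\sin\delta$ uniformly yields cells of equal area; the 4 × 4 layout and the function names are assumptions, since the text only states that 16 equal-area regions were used.

```python
import numpy as np

RA_MIN, RA_MAX = 131.0, 241.0  # right ascension limits of Eq. (2) [deg]
DEC_MIN, DEC_MAX = 5.0, 60.0   # declination limits of Eq. (2) [deg]


def box_area_deg2(ra_min, ra_max, dec_min, dec_max):
    """Exact solid angle of an RA/Dec box, d(RA) * (sin dec_max - sin dec_min), in deg^2."""
    d_ra = np.radians(ra_max - ra_min)
    d_sin = np.sin(np.radians(dec_max)) - np.sin(np.radians(dec_min))
    return d_ra * d_sin * (180.0 / np.pi) ** 2


def equal_area_region(ra, dec, n_ra=4, n_dec=4):
    """Index 0..n_ra*n_dec-1 of the equal-area region containing each source (-1 if outside).

    Uniform steps in RA and in sin(dec) give cells of identical solid angle.
    The 4 x 4 layout is an assumption; the text only specifies 16 regions.
    """
    ra = np.asarray(ra, dtype=float)
    s = np.sin(np.radians(np.asarray(dec, dtype=float)))
    s_min, s_max = np.sin(np.radians(DEC_MIN)), np.sin(np.radians(DEC_MAX))
    inside = (ra >= RA_MIN) & (ra < RA_MAX) & (s >= s_min) & (s < s_max)
    i_ra = np.floor((ra - RA_MIN) / (RA_MAX - RA_MIN) * n_ra).astype(int)
    i_dec = np.floor((s - s_min) / (s_max - s_min) * n_dec).astype(int)
    return np.where(inside, i_dec * n_ra + i_ra, -1)


area = box_area_deg2(RA_MIN, RA_MAX, DEC_MIN, DEC_MAX)
print(f"footprint before masking: {area:.0f} deg^2; per region: {area / 16:.0f} deg^2")
```

The mean density of each sample would then be measured separately in each region index, for example with `np.bincount` over the region labels.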


3.1. Sampling the near-infrared colour space 

We select sources in square (J-H, H-K_s) colour cells with a width of 0.05 mag. The size of the cells is chosen to maximise the number of colour cells while maintaining sufficient on-sky densities to measure the angular cross-correlation. We note that this binning is narrower than the typical colour error of most 2MASS sources. We restrict our analysis to cells with source densities greater than 0.06 per square degree, which we have found to be the minimum density required for an accurate measurement of the mean density; the cells passing this cut contain about 90% of the 2MASS extended source catalogue. The corresponding cells are displayed in Figure 4. In each cell we measure the distribution of clustering redshifts over the range provided by our reference sample (0.03 < z < 0.8). Since we do not detect any signal at z > 0.35, we show clustering redshift measurements only up to z = 0.4. Each redshift distribution is normalised to unity (i.e., $\int (dN/dz)\, dz = 1$). Errors are estimated through Poisson statistics.
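The colour-cell selection described above can be sketched as follows: sources are binned into square cells of width 0.05 mag in (J-H, H-K_s), and a cell is kept only if its surface density over the footprint exceeds 0.06 deg$^{-2}$, i.e. more than about 288 sources for a 4800 deg$^2$ area. Anchoring the cell edges at zero colour and the function name are assumptions.

```python
import numpy as np


def colour_cells(j_h, h_k, area_deg2, width=0.05, min_density=0.06):
    """Group sources into square (J-H, H-Ks) colour cells; keep dense cells only.

    Returns {(i, j): array of source indices}, where i = floor((J-H)/width)
    and j = floor((H-Ks)/width). Cells whose surface density over area_deg2
    does not exceed min_density (per deg^2) are discarded.
    """
    i_idx = np.floor(np.asarray(j_h, float) / width).astype(int)
    j_idx = np.floor(np.asarray(h_k, float) / width).astype(int)
    cells = {}
    for src, key in enumerate(zip(i_idx.tolist(), j_idx.tolist())):
        cells.setdefault(key, []).append(src)
    min_count = min_density * area_deg2
    return {key: np.array(members) for key, members in cells.items()
            if len(members) > min_count}


# Toy usage on a 100 deg^2 field: the threshold is then about 6 sources per cell.
jh = [0.625] * 10 + [0.125] * 3
hk = [0.325] * 10 + [0.425] * 3
kept = colour_cells(jh, hk, area_deg2=100.0)
print(kept)
```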

Across the colour space, we observe redshift distributions extending from z ~ 0 to about 0.4, with mean redshifts ranging from 0.05 to 0.25. For bluer sources we observe narrower redshift distributions, with a mean redshift of z ~ 0.1 and a width of about σ_z = 0.05. To display the relationship between near-infrared colours and redshift, we show the mean redshift of each colour cell using a coloured vertical bar. As can be seen, the relationship between H-K colour and redshift is steeper than that seen for the J-H colour. The J-H colour is more degenerate with the stellar composition of a galaxy than the H-K colour: because the H and K bands lie further down the Rayleigh-Jeans tail in the rest frame, the flux in these bands from galaxies at higher redshift still arises from the Rayleigh-Jeans tail. Further, as the colours become redder, the redshift distributions become wider as well. This phenomenon likely arises from the colour-redshift degeneracy between higher-redshift, dust-poor galaxies and lower-redshift, dusty galaxies. The results presented in this figure illustrate the power of clustering redshift estimation: this technique allows us to characterise redshift distributions in a photometric space (near-infrared colours) where classical photometric redshift techniques fail owing to the lack of a strong correlation between colour and redshift (Jarrett 2004). Our analysis also demonstrates that some redshift information can be extracted from the near-infrared colours of 2MASS galaxies.
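The mean redshift and width quoted for each cell are the first two moments of its normalised $dN/dz$. A minimal sketch, assuming equal-width bins and clipping negative bins (which noisy clustering measurements can produce) to zero, a choice not described in the text:

```python
import numpy as np


def mean_and_width(z, dndz):
    """Mean redshift and standard deviation of a binned redshift distribution.

    Assumes equal-width bins; negative bins (possible for noisy
    clustering-redshift estimates) are clipped to zero.
    """
    z = np.asarray(z, float)
    weights = np.clip(np.asarray(dndz, float), 0.0, None)
    weights = weights / weights.sum()
    mean = np.sum(weights * z)
    sigma = np.sqrt(np.sum(weights * (z - mean) ** 2))
    return mean, sigma


# Usage on a Gaussian distribution with mean 0.2 and width 0.05.
z = np.linspace(0.0, 0.4, 401)
dndz = np.exp(-0.5 * ((z - 0.2) / 0.05) ** 2)
print(mean_and_width(z, dndz))
```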

To verify the validity of our clustering redshift estimates, we can compare our results to (incomplete) spectroscopic observations in the optical. The SDSS Legacy spectroscopy has observed about 90% of the 2MASS galaxies that lie within its footprint. In the same figure, we present the spectroscopic redshift distributions of sources selected as a function of near-infrared colours, using thin blue lines. As can be seen, for the vast majority of the colour space, clustering and spectroscopic redshift estimates are in good agreement. This further validates our method and illustrates the level of accuracy reachable by clustering redshifts (>95% over the full colour space). Small differences between the spectroscopic and clustering redshifts are nevertheless visible, growing gradually toward the upper-right corner of the figure. These discrepancies appear to be caused by incompleteness of the spectroscopic sample: as the NIR colour increases, the spectroscopic sample is limited to brighter and therefore lower-redshift sources, whereas the clustering redshift distribution uses all sources. The upper-right corner of the figure (high J-H and high H-K) corresponds to the reddest near-infrared sources, which are more likely to lack optical detections. The difference between the spectroscopic and clustering redshift estimates therefore mostly reflects the selection bias introduced by the requirement of optical data, as discussed in the Introduction. This illustrates the bias implicit in using optical data to characterise the redshift distribution of NIR sources and the advantage of clustering redshift estimation.

Fig. 4.— The full colour-separated redshift distribution of the 2MASS Extended Source Catalogue with K_s < 14. Each cell represents the redshift distribution of the sources within the designated J-H and H-K_s range. Each redshift range spans 0.03 < z < 0.4 and is normalised by the number of sources within the range. The mean of each distribution is indicated by the coloured vertical line, with redshift values indicated in the colour bar. The overall redshift of the distributions increases with colour, as expected. The redshift distribution of the 2MASS galaxies with spectra available from SDSS is indicated in the background of each cell. The red arrow indicates the direction in colour space in which the SDSS spectroscopic sample is biased towards brighter and more nearby objects with respect to the full 2MASS sample.

Since this analysis maps the magnitude and colour of a 2MASS galaxy onto a redshift distribution, this information can be used as a NIR-only photometric redshift that does not depend on any SED models or training sets. We make the data and code for the full redshift distributions available online.⁷ We note that the redshift distribution corresponding to a given cell in colour space can be interpreted in two ways: it provides an estimate of the redshift distribution of the population of objects in that cell or, alternatively, the probability distribution function for the redshift of an individual 2MASS galaxy in that cell.
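Read as a probability distribution, a cell's $dN/dz$ can be used directly as a redshift estimate for an individual galaxy. The sketch below looks up the cell containing a galaxy and integrates its $p(z)$ over a redshift interval. The dictionary layout `cell_pdfs`, the cell-index convention and the flat toy distribution are hypothetical and do not describe the format of the released data.

```python
import numpy as np


def cell_key(j_h, h_k, width=0.05):
    """Colour-cell index (i, j) = floor of each colour divided by the cell width."""
    return int(np.floor(j_h / width)), int(np.floor(h_k / width))


def redshift_pdf(j_h, h_k, cell_pdfs, width=0.05):
    """Return the unit-normalised p(z) of the galaxy's colour cell, or None.

    cell_pdfs is a hypothetical mapping {(i, j): p(z) array on a shared z grid};
    galaxies in cells without a measured distribution get None.
    """
    return cell_pdfs.get(cell_key(j_h, h_k, width))


def probability_between(z, pdf, z_lo, z_hi):
    """Integral of p(z) over [z_lo, z_hi), assuming equal-width bins centred on z."""
    z = np.asarray(z, float)
    dz = z[1] - z[0]
    mask = (z >= z_lo) & (z < z_hi)
    return float(np.sum(np.asarray(pdf, float)[mask]) * dz)


# Toy usage: a flat p(z) over 0.03 < z < 0.40 for the cell containing (0.625, 0.325).
z_grid = 0.035 + 0.01 * np.arange(37)
cell_pdfs = {(12, 6): np.full(37, 1.0 / 0.37)}
pdf = redshift_pdf(0.625, 0.325, cell_pdfs)
p_high = probability_between(z_grid, pdf, 0.2, 0.4)
print(f"P(0.2 <= z < 0.4) = {p_high:.3f}")
```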

3.2. Near-infrared dropouts 

The clustering redshift method produces distance information for populations of galaxies with limited or missing photometric information. The 2MASS extended source catalogue contains large numbers of sources with magnitudes below the flux limit in one or more bands (~150 000 at K_s < 14.5), which we refer to as "dropouts". The 2MASS detection criteria are set by the K-band; consequently, most sources in 2MASS have a measured K-band magnitude but may be missing J- and/or H-band photometry. We separate the dropout populations to determine their redshift distributions: those with detections in the J and K bands (H-band-only dropouts), those with H- and K-band detections (J-band-only dropouts), and those with only K-band detections (J- and H-band dropouts). These sources, with only upper limits on their flux in one or more bands, are difficult or impossible to characterise with photometric redshift methods. We present the clustering redshift distributions of these populations in Figure 5.

Fig. 5.— The clustering redshift distributions of H-band dropouts (top), J- and H-band dropouts (middle) and J-band dropouts (bottom). The insets indicate the distributions of SDSS spectroscopic galaxies cross-matched to the 2MASS dropout populations. We separate the samples into star-forming (sSFR > $10^{-11}$ yr$^{-1}$; blue) and quiescent (sSFR < $10^{-11}$ yr$^{-1}$; red) galaxies, as measured spectroscopically. The SDSS samples are incomplete.

7 The data and code to access the full redshift distributions are available at http://www.pha.jhu.edu/~mubdi/2massz

The redshift distributions of these populations have an upper redshift bound consistent with that of the fully detected sources in the 2MASS extended source catalogue; they show little to no redshift signal beyond z = 0.4. Since the dropout populations have photometry below the flux limit in one or more bands, they tend to lie at higher redshifts (0.2 < z < 0.3), as closer sources would be above the limit in all 2MASS bands; nearby objects analogous to these populations would appear as sources with extreme colours. The composition of the dropout populations would therefore differ greatly if the flux limits of the survey were different.

Cross-matching against SDSS sources, we can explore the composition of these 2MASS dropout populations through the subset of sources with optical spectroscopy. We note that only 50-70% of the dropouts within the SDSS footprint have spectroscopic observations, and that at z > 0.15 the spectroscopic selection is incomplete and biased towards quiescent galaxies. We compare the spectroscopically cross-matched sources with the clustering redshift distributions of the full populations in the insets of Figure 5. The cross-matched samples have an overall lower redshift distribution than the entire dropout populations. This is expected, since the spectroscopic sources are typically flux limited and thus miss the more distant and fainter sources. Consequently, we expect the incompleteness of the cross-matched populations relative to the full populations to arise at higher redshifts. We separate the cross-matched samples into star-forming and quiescent populations using specific star formation rates, placing the boundary at sSFR = $10^{-11}$ yr$^{-1}$, as measured by Tremonti et al. (2004). We find that all dropout populations comprise a combination of star-forming and quiescent galaxies, indicating that no single spectral feature or emission process causes any given dropout population.

While the higher-redshift population of spectroscopic sources will be biased towards quiescent galaxies, the lower-redshift population (z < 0.1) should be complete at the optical magnitudes of these sources. At low redshift, the sources are dominated by star-forming galaxies, with quiescent galaxies contributing a larger fraction as redshift increases.

Star-forming galaxies have the strongest emission lines, such as the Paschen lines that dominate the NIR emission-line spectrum, leading to the largest deviations from continuum-dominated colours (Figure 1). For fainter sources, this can lead to a dropout in one or more of the bands. In particular, the redshift distribution of star-forming galaxies from the spectroscopic cross-match is consistent with the Paschen-α emission line, the strongest in the NIR, being redshifted through the K-band filter.
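The redshift range over which Paschen-α falls inside the K_s filter follows from $1+z = \lambda_{\rm obs}/\lambda_{\rm rest}$. The sketch below approximates the filter as a top hat centred at 2.159 μm with a 0.262 μm width and takes a Paschen-α rest wavelength of 1.8756 μm; these are approximate values supplied here for illustration, not numbers from the text. Under these assumptions the line lies inside the band for roughly 0.08 < z < 0.22.

```python
# Approximate values supplied for illustration (not taken from the text):
PA_ALPHA_REST_UM = 1.8756  # rest-frame (vacuum) wavelength of Paschen-alpha [micron]
KS_CENTRE_UM = 2.159       # 2MASS Ks central wavelength [micron]
KS_WIDTH_UM = 0.262        # 2MASS Ks bandwidth [micron]


def line_redshift(lambda_obs_um, lambda_rest_um=PA_ALPHA_REST_UM):
    """Redshift at which a line emitted at lambda_rest is observed at lambda_obs."""
    return lambda_obs_um / lambda_rest_um - 1.0


def band_window(centre_um, width_um, lambda_rest_um=PA_ALPHA_REST_UM):
    """Redshift range placing the line inside a top-hat approximation of the band."""
    z_lo = line_redshift(centre_um - 0.5 * width_um, lambda_rest_um)
    z_hi = line_redshift(centre_um + 0.5 * width_um, lambda_rest_um)
    return max(z_lo, 0.0), z_hi


z_lo, z_hi = band_window(KS_CENTRE_UM, KS_WIDTH_UM)
print(f"Pa-alpha lies inside Ks for roughly {z_lo:.2f} < z < {z_hi:.2f}")
```

A real filter has sloping edges, so the boundaries of this window are soft rather than sharp.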

For quiescent galaxies, we infer that the differences in flux limits between the J, H and K bands, combined with the galaxy's continuum emission, lead to galaxies falling out of one or more bands. This can be caused by the galaxy's extinction and star formation history, both of which imprint an SED-altering signal on the continuum emission of a quiescent galaxy.

3.3. Global Redshift Distribution 

We now estimate the full redshift distribution of the 2MASS
extended source catalogue by summing the distributions
of each colour-magnitude selected sample, along with the
dropout populations, and weighting each population by the
number of sources in each sample. We present this global redshift
distribution in Figure 6. We break the distribution into
flux-limited samples to show the magnitude evolution of the
catalogue. While the 2MASS redshift survey, which roughly
corresponds to a magnitude limit of Ks = 11.75, probes only
4 × 10^4 sources extending to z < 0.05, our clustering redshift
analysis allows us to measure redshift distributions for the entire
extended source catalogue (over 10^6 objects), which are
found to span a redshift range reaching z ~ 0.35. Our redshift
distribution estimates include the contributions of both
sources without SDSS spectroscopy, corresponding to about
10% of the extended source catalogue, and near-infrared
dropouts, which comprise another 10% of the total sample.
Our analysis also allows us to show how the shape of the
redshift distribution changes with limiting magnitude, as indicated
in Figure 6.
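The combination step described above amounts to a number-weighted sum of per-sample redshift distributions. The following is a minimal sketch under the assumption that each sample's dN/dz is available on a common grid of redshift bins; the function name `combine_dndz` and the toy inputs are hypothetical and not the authors' code.

```python
import numpy as np


def combine_dndz(pdfs, counts, dz):
    """Number-weighted sum of per-sample redshift distributions (hypothetical helper).

    pdfs   : (n_samples, n_bins) array of dN/dz shapes, one row per sample
    counts : (n_samples,) number of sources in each sample
    dz     : redshift bin width
    Returns the combined dN/dz in objects per unit redshift.
    """
    pdfs = np.asarray(pdfs, dtype=float)
    counts = np.asarray(counts, dtype=float)
    # Normalise each row so that sum(row * dz) == 1 (a unit-area shape).
    shapes = pdfs / (pdfs.sum(axis=1, keepdims=True) * dz)
    # Weight each shape by its sample size and add the samples together.
    return (counts[:, None] * shapes).sum(axis=0)


# Toy example with two hypothetical samples and four redshift bins.
dz = 0.05
pdfs = [[1.0, 1.0, 0.0, 0.0],   # e.g. one colour-magnitude cell
        [0.0, 1.0, 1.0, 0.0]]   # e.g. a dropout population
counts = [100, 50]
total = combine_dndz(pdfs, counts, dz)
print(total, total.sum() * dz)  # the integral recovers the 150 input sources
```

Restricting the rows to samples brighter than a given Ks limit before summing would give the flux-limited curves in the same way.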

4. CLUSTERING REDSHIFTS OF
2MASS POINT SOURCES

We now estimate the clustering redshift distribution for the
2MASS point source catalogue, with Ks < 14. We use this
magnitude limit to remain consistent with the analysis of the
extended sources. With a PSF of three arcseconds, we expect
a significant fraction of galaxies to be unresolved. The
clustering redshift method can be used to (a) determine the
fraction of extragalactic sources in the point source catalogue
and (b) estimate their redshift distribution.

4.1. Method 

Galactic stars are not expected to give rise to any spatial
cross-correlation signal with extragalactic sources. The set of
cross-correlations with reference objects can, therefore, be
used to probe the redshift distribution of the extragalactic contribution
to the sample (galaxies and/or quasars). However, in
order to estimate the number of extragalactic objects, we can
no longer rely on the normalization described in Equation |T],
since the total number of sources includes both extragalactic
and Galactic objects. To estimate the fraction of galaxies $f_{\rm gal}$
in the point source catalogue, we can take several approaches.
We list them below, ordered by the amount of external information
or assumptions required:

• The number of sources contributing to the measured
overdensity contains information about the number of
extragalactic objects. If one considers angular apertures
smaller than the mean separation between reference
objects, the excess quantity of unknown objects
around each reference source provides us with a minimum
estimate for the number of extragalactic objects
within the sample. Taking into account the size of the
unknown sample, this can then be used to estimate a
minimum $f_{\rm gal}$ without the use of any assumption. If the
reference sample covers a wide enough redshift range,
this technique can be used to test whether a given sample
contains extragalactic sources.

• Alternatively, assuming that the redshift distribution of
the sources is a smooth function, its measured scatter
can be used to infer the number of objects contributing
to the signal. For example, if $f_{\rm gal} = 1$, then all objects
in the sample will contribute to the measured cross-correlations
as a function of redshift, and consequently
the relative Poisson noise on the estimated dN/dz will
be low. If, for the same redshift distribution, $f_{\rm gal}$ decreases,
then a smaller number of extragalactic objects
will contribute to the cross-correlation signal and its
scatter will increase systematically. Information on the
fraction $f_{\rm gal}$ of sources contributing to the clustering
signal can therefore be extracted from the measured
scatter of the inferred redshift distribution.

• To get a rough estimate of the fraction of extragalactic
objects $f_{\rm gal}$ in the sample, one can compare the amplitude
of the measured cross-correlation function to
the typical amplitude of galaxy correlation functions
of similar sources at a similar redshift. The ratio between
the two can be used as an indicator of the dilution
factor due to the presence of Galactic sources in the
sample, which do not contribute to any clustering signal.
The dilution factor can be converted into the fraction of
sources $(1 - f_{\rm gal})$ that do not contribute to the correlation
signal. The accuracy of this approach is limited by the
lack of knowledge of the clustering amplitude (or bias)
of the extragalactic objects from the unknown sample,
which depends on galaxy type and redshift.

• If one can obtain some information on the redshift dependence
of the clustering amplitude (or bias) of the unknown
sources, one can more precisely relate measured
cross-correlation functions to excess number counts.
Assuming that the extragalactic sources probed by the
point source catalogue with given colours are of similar
type to the objects probed by the extended source catalogue,
one can first estimate the amplitude of their spatial
correlations with the reference sample normalised
by the expected biases, which we denote w'. If the biases
are properly estimated, we expect the ratio r between
the number of extragalactic objects contributing
to the cross-correlation signal and w' to be roughly constant.
One can then verify that r depends only weakly
on photometric parameters and redshift. Finally, one can
use its value to convert the measured correlation functions
for the point source catalogue to number counts
and therefore obtain an estimate of $f_{\rm gal}$.
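The second and third approaches above reduce to simple scalings, sketched below under idealised assumptions: stars are taken to be spatially uncorrelated with the reference objects (so they only dilute the measured amplitude), and the noise is treated as pure Poisson noise from the extragalactic objects, ignoring the extra shot noise the stars add. Function names and numbers are hypothetical.

```python
import math


def fgal_from_dilution(w_measured: float, w_expected: float) -> float:
    """Dilution estimate: stars add to the normalisation but not to the pair
    excess, so the measured amplitude is f_gal times the pure-galaxy one.
    Assumes w_expected is known for similar galaxies at similar redshift."""
    return w_measured / w_expected


def relative_noise(n_sources: int, f_gal: float) -> float:
    """Idealised Poisson scaling: only f_gal * n_sources objects carry the
    clustering signal, so the relative noise grows as 1 / sqrt(f_gal * N)."""
    return 1.0 / math.sqrt(f_gal * n_sources)


# Hypothetical numbers: an observed amplitude ten times below the expected one
# implies that roughly 10% of the sample is extragalactic.
print(fgal_from_dilution(0.002, 0.02))
# For a fixed sample size, lowering f_gal from 1 to 0.25 doubles the noise.
print(relative_noise(10_000, 0.25) / relative_noise(10_000, 1.0))
```

As the text notes, the dilution estimate is only as good as the assumed amplitude `w_expected`, which depends on galaxy type and redshift.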

4.2. Application to 2MASS 

Our goal here is only to provide a rough estimate of the
fraction $f_{\rm gal}$ of galaxies in the point source catalogue. A precise
characterization of $f_{\rm gal}$ and a comparison between different
estimation methods are beyond the scope of the present analysis.
Among the approaches presented above, we use the
last one; that is, we assume that the redshift evolution
of the clustering properties of the point and extended sources
is similar. We note that violations of this assumption only
weakly affect the final redshift distribution estimate.

Before applying the corresponding estimator, we select a
region of the sky that is more favourable to our task. The fraction
of extragalactic sources present in the point source catalogue
is expected to be a strong function of Galactic latitude.
To minimise the stellar contribution, we limit our clustering
redshift analysis to a high-Galactic-latitude footprint defined
by:

$$160^\circ < \alpha < 180^\circ, \qquad 20^\circ < \delta < 5^\circ \qquad (3)$$

chosen for computational convenience. This area corresponds
to a Galactic latitude of about 65°. The noise level of this
measurement is expected to be greater than that obtained with
the extended source catalogue, as Galactic sources contribute
only shot noise. To maintain a comparable signal-to-noise
ratio, we use larger redshift bins for this measurement
than in the previous results (Δz = 0.02).

Fig. 6.— The total redshift distribution of the 2MASS extended source catalogue as a function of magnitude (in blue). The redshift distribution of 2MRS
sources is shown in magenta for comparison.

Fig. 7.— The clustering redshift distribution of the 2MASS point sources is presented in magenta, with the total redshift distribution of the XSC shown
in blue for comparison. There are as many extragalactic sources in the point source catalogue as there are in the XSC. In purple, we present the total redshift
distribution of all extragalactic sources in 2MASS. One noise-dominated point at z = 0.28 in the point source distribution has been filtered and interpolated over.

For each (J − H, H − K) colour cell $i_c$ of the 2MASS extended
source dataset (see §3.1), we first estimate w', the clustering
amplitude with our reference objects normalised by the
expected biases:


$$w'_{\rm XSC}(z, i_c) = \frac{w_{\rm XSC}(z, i_c)}{b_u(z)\, b_r(z)} \qquad (4)$$


where $b_u$ and $b_r$ are the galaxy biases of the unknown and reference
samples. As discussed in §2.1, we use a measured bias
for the reference sample, whereas we assume d log b / d log z =
1 for the 2MASS unknown samples, which we showed to be
a reasonable assumption in §2.3 and §3.1. Next, we compute
the ratio between the number counts of unknown sources and
the normalised clustering amplitude:


$$r_{\rm XSC}(z, i_c) = \frac{N(z, i_c)/N_{\rm tot}(i_c)}{w'_{\rm XSC}(z, i_c)} \qquad (5)$$
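Equations (4) and (5) can be sketched for a single colour cell as follows. All inputs are made-up numbers; the unknown-sample bias follows the stated assumption d log b / d log z = 1 (b_u proportional to z), and its overall normalisation, set here to 1 at z = 0.1, is an arbitrary choice of ours that rescales w' and is absorbed into r.

```python
import numpy as np


def normalised_amplitude(w, b_u, b_r):
    """Eq. (4): divide the measured amplitude by both sample biases."""
    return np.asarray(w, dtype=float) / (np.asarray(b_u) * np.asarray(b_r))


def count_ratio(n_z, n_tot, w_prime):
    """Eq. (5): normalised number counts N(z, i_c) / N_tot(i_c) over w'."""
    return (np.asarray(n_z, dtype=float) / n_tot) / np.asarray(w_prime)


# Hypothetical inputs for a single colour cell i_c.
z = np.array([0.05, 0.10, 0.15])
b_r = np.array([1.2, 1.3, 1.4])     # made-up reference-sample bias
b_u = z / 0.10                      # d log b / d log z = 1, i.e. b_u proportional to z
w = np.array([0.030, 0.052, 0.084]) # made-up measured amplitudes
n_z = np.array([52.5, 42.0, 42.0])  # made-up counts per unit redshift
n_tot = 1000.0

w_prime = normalised_amplitude(w, b_u, b_r)
r = count_ratio(n_z, n_tot, w_prime)
print(w_prime, r)  # r comes out constant across z for these toy numbers
```

A roughly constant r across bins and colour cells is the consistency check the method relies on; real data will scatter around the mean.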


As expected, we can verify that r(z, $i_c$) is a relatively
weak function of colour and redshift: over all the colour
cells and redshift bins for which a clustering signal is detectable,
we find ⟨r⟩ = 1.05 and σ_r = 0.25. Using the average
value of r, we can then convert the measured clustering amplitude
as a function of redshift for each colour cell into
an estimate of the fraction of galaxies in the point source catalogue.
As r appears to be only weakly colour dependent, we can
directly add the contributions from all redshift bins:


$$f_{\rm gal} = \langle r \rangle \times \sum_j w'_{\rm PSC}(z_j)\, \Delta z_j \qquad (6)$$

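Equation (6), together with the treatment of the unphysical bin mentioned in the Figure 7 caption, can be sketched as below. The caption only says the noise-dominated point was "filtered and interpolated over"; the linear interpolation used here is our assumption, and all input values are hypothetical.

```python
import numpy as np


def fill_negative_bins(z, w_prime):
    """Replace unphysical negative bins by linear interpolation over the
    remaining bins (an assumed stand-in for 'filtered and interpolated over')."""
    z = np.asarray(z, dtype=float)
    w_prime = np.asarray(w_prime, dtype=float)
    good = w_prime >= 0
    return np.interp(z, z[good], w_prime[good])


def galaxy_fraction(mean_r, w_prime_psc, dz):
    """Eq. (6): f_gal = <r> * sum_j w'_PSC(z_j) * dz_j."""
    return mean_r * np.sum(np.asarray(w_prime_psc) * np.asarray(dz))


# Hypothetical normalised amplitudes for four bins of width 0.02.
z = np.array([0.05, 0.07, 0.09, 0.11])
w_prime_psc = np.array([2.0, 1.0, -0.2, 0.5])
cleaned = fill_negative_bins(z, w_prime_psc)  # the -0.2 bin becomes 0.75
f_gal = galaxy_fraction(1.05, cleaned, 0.02)
print(cleaned, f_gal)
```

Because ⟨r⟩ is a single constant, interpolating w' or interpolating the resulting dN/dz gives the same answer.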

Over the entire redshift range provided by the spectroscopic
reference sample (0.03 < z < 0.8), we estimate that

$$f_{\rm gal} \simeq 0.10. \qquad (7)$$

This fraction is relatively low. However, given that the
point source catalogue density (in this high-Galactic-latitude
region, after removing sources identified as extended) is
337 obj/deg², this leads to an extragalactic point source density
of 33 obj/deg². This source density is as high as that of
the extended source catalogue (31 obj/deg²). Unlike Galactic
sources, we anticipate no variation in the density of the extragalactic
sources over the sky. We can compare our clustering-based
estimate of $f_{\rm gal}$ to a more direct estimate of this value
by computing the fraction of 2MASS point sources classified
as extended objects (galaxies) in SDSS. Doing so, we find that

$$f_{\rm gal} \simeq 0.07 \quad \text{(using SDSS photometry)}, \qquad (8)$$

a number in good agreement with the above estimate. We
note that this estimate is based on optical detections and is
consequently only a lower limit on the intrinsic value. While
our method does not determine which of the sources in the
catalogue are extragalactic, there are approaches that seek to
use information from other surveys to conduct this separation,
such as Kovács & Szapudi (2015).

Having estimated $f_{\rm gal}$, we can now characterise the redshift
distribution of the extragalactic sources present in the
point source catalogue. To do so, we measure the set of spatial
cross-correlations between all sources in the point source
catalogue and the reference spectroscopic sample introduced
in §3.1.

We present the estimated redshift distribution of both
2MASS extended and point sources with Ks < 14 in Figure 7.
We find a non-zero signal⁸ from the lowest redshifts up to
z ~ 0.8. As expected, the noise level of the measurement is
significantly greater than in the earlier work with the extended
source catalogue. We find the bulk of the extragalactic point
sources to have a redshift distribution similar to that of the
extended source catalogue⁹. This indicates that, even at low redshift,
the extended source catalogue alone is incomplete in sampling
the total galaxy population. In addition, we observe a
tail extending significantly beyond the highest redshifts of the
extended sources, around z = 0.3. The maximum redshift of the
extended source catalogue is therefore limited not by brightness
but by angular size. Adding the contributions from both the extended
and point source catalogues, we find a relatively smooth
redshift distribution continuously declining from the lowest
redshift probed by our analysis, z = 0.03, up to z ~ 0.7. We note
that the cosmic volume probed by the point source catalogue
is about ten times larger than that of the extended source
catalogue.


5. CONCLUSIONS

Using clustering-based redshift inference, we have explored
the redshift distribution of all sources from the Two-Micron
All-Sky Survey, through the extended and point source
catalogues, containing 1.6 and 470 million objects, respectively,
across the entire sky. As its near-infrared photometry primarily
probes the featureless Rayleigh-Jeans tail of galaxy spectral energy
distributions, colour-based redshift estimation is uninformative.

We applied the clustering redshift technique following the
implementation presented in Rahman et al. (2015a) to about
5,000 square degrees of 2MASS data. Our main results are as
follows:

• We first demonstrated the robustness of our clustering
redshift technique by reproducing the redshift distribution
of sources with Ks < 11.75, for which the 2MASS
Redshift Survey provides complete spectroscopic redshift
measurements.

• We measured the redshift distributions of 2MASS extended
sources down to Ks = 14.0 as a function of their
J − H and H − Ks colours. A comparison to spectroscopic
redshifts from SDSS optical data shows excellent
agreement. In addition, we presented redshift distribution
estimates for near-infrared dropouts in J, H
and J&H, for which only limited spectroscopic information
is available and colour-based techniques tend
to fail. Finally, we combined all these subsamples to
present an estimate of the global redshift distribution of
the 2MASS extended catalogue, which displays sources
up to z ~ 0.3.

• We explored the content of the Point Source Catalogue
and, based on clustering information, showed that at
high Galactic latitude about 10% of the sources are extragalactic
down to Ks = 14.0. This implies that the
point source catalogue contains about 1.6 million extragalactic
objects, i.e., as many as in the extended
source catalogue. We presented the redshift distribution
of these sources, which shows galaxies extending to
z ~ 0.7. The Point Source Catalogue therefore provides
a full-sky sample of extragalactic sources with a cosmic
volume about ten times larger than that of the extended
source catalogue. It is thus of potential interest for
cosmological experiments such as measurements of the
Integrated Sachs-Wolfe effect.

• Finally, adding the contributions from both the extended
and point source catalogues, we presented the
full redshift distribution of the 2MASS survey, which
showed a relatively smooth distribution continuously
declining from the lowest redshift probed by our analysis,
z = 0.03, up to z ~ 0.7.

8 Among the 35 dN/dz estimates for the point source catalogue, one redshift
bin at z = 0.28 led to a negative value, which is unphysical. This point
appears to be an outlier in our entire analysis and might be due to a measurement
artefact in the photometric dataset, possibly caused by inhomogeneities in
the on-sky source distribution. We decided to exclude it in the final
characterization of the redshift distribution.

9 The full redshift distribution data are available at http://www.pha.jhu.edu/~mubdi/2massz

In addition to characterising the 2MASS survey, this analysis
illustrates the potential of clustering redshift estimation.
Previous redshift estimates of 2MASS sources have relied
on optical observations, which suffer from a selection bias
present when near-infrared sources are not detected in the optical.
In contrast, clustering-redshift estimation can be performed
regardless of the optical properties of the sources and
can directly be used to infer the redshift distribution of the
entire 2MASS dataset. When possible, we have presented direct
comparisons to spectroscopic redshift distributions, which
show excellent agreement, hence validating our method and
its implementation. Our redshift distribution estimates as a
function of near-infrared colours, valid for the entire sky, are
made available through a webpage. This work further demonstrates
the power and potential of the clustering redshift technique.